Student Performance Charts

Using the "Students Performance in Exams" dataset, you can answer a variety of questions that explore the relationships between student demographics, socio-economic factors, and academic performance. Here are some key questions that can be addressed with this dataset:

Performance Analysis

  • Gender and Performance: How do average scores in math, reading, and writing differ between male and female students? Are there significant differences in performance between genders across different subjects?

  • Parental Education and Performance: Is there a correlation between the parental level of education and student performance in math, reading, and writing? Do students whose parents have higher education levels perform better on average?

  • Test Preparation Course: How does completing a test preparation course impact student scores in math, reading, and writing? What percentage of students who completed the test preparation course scored above a certain threshold?

  • Lunch Type: Does the type of lunch (standard vs. free/reduced) affect student performance? How do average scores compare between students who receive standard lunch and those who receive free/reduced lunch?

Comparative Analysis

  • Race/Ethnicity and Performance: How do average scores vary across different race/ethnicity groups? Are there notable performance gaps between different racial/ethnic groups?

  • Subject-wise Performance: How do students' performances in math, reading, and writing compare? Are there students who excel in one subject but perform poorly in another?

Correlation and Distribution

  • Score Correlation: Is there a correlation between math scores and reading scores? Between reading scores and writing scores? What are the strongest predictors of student performance in each subject?

  • Score Distribution: What is the distribution of scores in math, reading, and writing? Are there any outliers or unusual patterns in the score distributions?

Socio-economic Factors

  • Impact of Socio-economic Status: How does socio-economic status, inferred from parental education level and lunch type, impact student performance? Are there significant differences in performance based on socio-economic status?

Other Explorations

  • Combined Factors: How do multiple factors (e.g., gender, parental education, and test preparation) together affect student performance? Can we create a predictive model for student performance based on these factors?

Let’s start by loading the student performance data.

In [61]:
import pandas as pd
import altair as alt

# Load the dataset
data = pd.read_csv("exams.csv")
data.head()
Out[61]:
  | gender | race/ethnicity | parental level of education | lunch        | test preparation course | math score | reading score | writing score
0 | male   | group A        | high school                 | standard     | completed               | 67         | 67            | 63
1 | female | group D        | some high school            | free/reduced | none                    | 40         | 59            | 55
2 | male   | group E        | some college                | free/reduced | none                    | 59         | 60            | 50
3 | male   | group B        | high school                 | standard     | none                    | 77         | 78            | 68
4 | male   | group E        | associate's degree          | standard     | completed               | 78         | 73            | 68

Scatter Plot of Math vs. Reading Scores by Gender

In this section, we will explore the relationship between math and reading scores, segmented by gender. The scatter plot will allow us to visualize how male and female students perform in both subjects, helping to identify any trends or disparities in their scores.

The analysis aims to answer the following questions:

  • How do math and reading scores correlate for male and female students?
  • Are there significant differences in performance between genders in each subject?
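The strength of the relationship the scatter plot shows can also be summarized with Pearson's correlation coefficient. The sketch below is a minimal illustration that uses only the five rows printed by data.head() above as a stand-in sample; the name `sample` is introduced here for illustration, and on the full dataset you would use `data` instead. Five rows are far too few to draw conclusions; the point is the method.

```python
import pandas as pd

# Stand-in sample: the five rows shown by data.head() above.
# With the full dataset, use the `data` DataFrame loaded from exams.csv instead.
sample = pd.DataFrame({
    "gender": ["male", "female", "male", "male", "male"],
    "math score": [67, 40, 59, 77, 78],
    "reading score": [67, 59, 60, 78, 73],
    "writing score": [63, 55, 50, 68, 68],
})

subjects = ["math score", "reading score", "writing score"]

# Pearson correlation between math and reading scores across all students
overall_r = sample["math score"].corr(sample["reading score"])

# Pairwise correlation matrix of the three subjects
subject_corr = sample[subjects].corr()

# One correlation matrix per gender; a group with a single student
# (like the one female student in this sample) yields NaN entries
per_gender_corr = sample.groupby("gender")[subjects].corr()

print(f"math vs. reading, all students: r = {overall_r:.2f}")
print(per_gender_corr)
```

Computing the per-gender matrices answers the first question above directly; comparing them shows whether the math/reading relationship is equally tight for both groups.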

Let’s proceed with creating the scatter plot.

In [62]:
# Implementing selection
selection = alt.selection(type='multi', fields=['gender'])

alt.Chart(data).mark_circle().encode(
    x='math score:Q',
    y='reading score:Q',
    color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
    tooltip=['gender', 'math score', 'reading score'],
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(selection).properties(
    title='Scatter Plot of Math vs. Reading Scores by Gender'
)
Out[62]:

Interactive Scatter Plot of Math vs. Reading Scores by Gender

In the interactive scatter plot above, male students tend to score higher in math, while female students tend to score higher in reading. Click a blue or orange point to highlight every student of that gender; the remaining points fade, which makes it easier to focus on one group at a time.
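The visual impression can be checked numerically by comparing group means per subject. A minimal sketch, assuming the column names from exams.csv; `gender_gap` and `sample` are hypothetical names introduced for illustration. The five-row stand-in sample contains only one female student, so it cannot confirm or refute the pattern; run `gender_gap(data)` on the full dataset for a real answer.

```python
import pandas as pd

SUBJECTS = ["math score", "reading score", "writing score"]

def gender_gap(df: pd.DataFrame) -> pd.Series:
    """Mean female score minus mean male score, for each subject.

    Positive values: female students averaged higher.
    Negative values: male students averaged higher.
    """
    means = df.groupby("gender")[SUBJECTS].mean()
    return means.loc["female"] - means.loc["male"]

# Stand-in sample: the five rows shown by data.head() above
sample = pd.DataFrame({
    "gender": ["male", "female", "male", "male", "male"],
    "math score": [67, 40, 59, 77, 78],
    "reading score": [67, 59, 60, 78, 73],
    "writing score": [63, 55, 50, 68, 68],
})

print(gender_gap(sample))
# On the full dataset: gender_gap(data)
```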

Next, let's add panning and zooming to the scatter plot so that its details can be explored more closely.

In [63]:
# Implement Exploration (Pan and Zoom) with a title
chart = alt.Chart(data).mark_circle().encode(
    x='math score:Q',
    y='reading score:Q',
    color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
    tooltip=['gender', 'math score', 'reading score']
).interactive().properties(
    title='Interactive Scatter Plot of Math vs. Reading Scores by Gender'
)

chart
Out[63]:

Interactive Scatter Plot of Math vs. Reading Scores by Gender

In the interactive scatter plot above, scroll to zoom and drag to pan. This allows a more detailed examination of the relationship between math and reading scores for each gender.

In [64]:
#Implement Abstract/Elaborate:
# Abstract/Elaborate with semantic zoom
selection = alt.selection(type='multi', fields=['gender'])

# Overview chart
overview = alt.Chart(data).mark_bar().encode(
    y='count()',
    x='gender:N',
    color=alt.condition(selection, alt.value("orange"), alt.value("lightgrey"))
).add_selection(selection)

# Detail chart
detail = alt.Chart(data).mark_circle().encode(
    y='reading score:Q',
    x='math score:Q',
    color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
    tooltip=['gender', 'math score', 'reading score']
).transform_filter(selection).properties(
    title='Gender-Based Overview of Student Performance'
)

overview | detail
Out[64]:

Gender-Based Overview of Student Performance

In this view, the bar chart on the left shows the number of students of each gender. Click a bar to filter the scatter plot on the right to that gender, or shift-click the other bar to view both genders together.
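The linked view combines two steps: counting students per gender for the bars, and keeping only the selected genders in the scatter plot. The pandas sketch below mimics that logic outside the chart; `selected` stands in for whichever bars are clicked, and `filter_detail` and `sample` are hypothetical names. As with Altair's default for an empty selection, nothing selected means every row is kept.

```python
import pandas as pd

# Stand-in sample: gender and two score columns from the rows shown by data.head()
sample = pd.DataFrame({
    "gender": ["male", "female", "male", "male", "male"],
    "math score": [67, 40, 59, 77, 78],
    "reading score": [67, 59, 60, 78, 73],
})

# Overview: number of students per gender (what count() aggregates in the bar chart)
overview_counts = sample["gender"].value_counts()

def filter_detail(df: pd.DataFrame, selected: set) -> pd.DataFrame:
    """Rows shown in the detail scatter plot for a given set of selected genders."""
    if not selected:
        # Nothing clicked yet: an empty selection matches every row
        return df
    return df[df["gender"].isin(selected)]

print(overview_counts)
print(filter_detail(sample, {"female"}))
```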

In [65]:
#Implement Filtering:
# Bind selection to legend
selection = alt.selection(type='multi', fields=['gender'], bind='legend')

alt.Chart(data).mark_circle().encode(
    x='math score:Q',
    y='reading score:Q',
    color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
    tooltip=['gender', 'math score', 'reading score'],
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).add_selection(selection).properties(
    title='Interactive Filtering: Math vs. Reading Scores by Gender'
)
Out[65]:

Interactive Filtering: Math vs. Reading Scores by Gender

Click an entry in the "gender" legend to highlight that gender in the scatter plot; the other points fade. This lets you focus on the data points for one gender at a time.

In [66]:
# Implement Encoding: a dropdown chooses which score is plotted
import altair as alt

# Assumes 'data' is the DataFrame loaded from exams.csv above

dropdown = alt.binding_select(options=['math score', 'reading score', 'writing score'], name='Select a score:')
selection = alt.selection_single(fields=['Score'], bind=dropdown, init={'Score': 'math score'})

alt.Chart(data).transform_fold(
    ['math score', 'reading score', 'writing score'],
    as_=['Score', 'Value']
).transform_filter(
    selection
).mark_circle().encode(
    x='Value:Q',
    y='Score:N',
    color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
    tooltip=['gender:N', 'Value:Q', 'Score:N']
).add_selection(selection).properties(
    width=600,
    height=400,
    title='Dynamic Score Selection: Comparing Math, Reading, and Writing Scores by Gender'
)
Out[66]:
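`transform_fold` reshapes the data inside the chart: the three score columns become two columns, `Score` (the original column name) and `Value` (the number), with one row per student per subject. The dropdown then keeps only the rows whose `Score` matches the chosen option. A rough pandas equivalent, using `DataFrame.melt` on the five sample rows from data.head() (the name `sample` is illustrative; row order differs from Vega-Lite's fold, which does not matter for plotting):

```python
import pandas as pd

subjects = ["math score", "reading score", "writing score"]

# Stand-in sample: the five rows shown by data.head() above (gender and scores only)
sample = pd.DataFrame({
    "gender": ["male", "female", "male", "male", "male"],
    "math score": [67, 40, 59, 77, 78],
    "reading score": [67, 59, 60, 78, 73],
    "writing score": [63, 55, 50, 68, 68],
})

# Counterpart of transform_fold(subjects, as_=['Score', 'Value']):
# one row per (student, subject) pair
folded = sample.melt(id_vars=["gender"], value_vars=subjects,
                     var_name="Score", value_name="Value")

# Counterpart of transform_filter(selection) with the dropdown's initial value
chosen = "math score"
shown = folded[folded["Score"] == chosen]

print(shown)
```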
In [67]:
# Create the dropdown selection for lunch type
# (.tolist() turns the NumPy array of categories into a plain list of options)
lunch_types = data['lunch'].unique().tolist()
dropdown = alt.binding_select(options=lunch_types, name='Select Lunch Type:')
selection = alt.selection_single(fields=['lunch'], bind=dropdown, init={'lunch': lunch_types[0]})

# Transform the data to long format for scores
data_long = data.melt(id_vars=['gender', 'lunch'], value_vars=['math score', 'reading score', 'writing score'],
                      var_name='Score', value_name='Value')

# Create the chart with data transformation
chart = alt.Chart(data_long).transform_filter(
    selection
).mark_circle().encode(
    x='Value:Q',
    y='Score:N',
    color=alt.Color('gender:N', scale=alt.Scale(scheme='category10')),
    tooltip=['gender', 'Value', 'Score']
).add_selection(selection).properties(
    title="Impact of Lunch Type on Student Performance Across Subjects"
)

chart
Out[67]:
In [68]:
# Load the dataset
data = pd.read_csv("exams.csv")

# Melt the data to long format for easier plotting
data_long = data.melt(id_vars=['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course'],
                      value_vars=['math score', 'reading score', 'writing score'],
                      var_name='subject', value_name='score')

# Create a scatter plot of scores by subject
# Radio buttons switch between subjects, approximating an animation
input_radio = alt.binding_radio(
    options=['math score', 'reading score', 'writing score'],
    name='Select Subject:',
)
selection = alt.selection_single(
    fields=['subject'],
    bind=input_radio,
    name="subject_selection",
    init={'subject': 'math score'}
)

# Create the chart
chart = alt.Chart(data_long).mark_circle().encode(
    x='score:Q',
    y=alt.Y('subject:N', title=''),
    color='gender:N',
    tooltip=['gender', 'score', 'subject']
).transform_filter(
    selection
).add_selection(
    selection
).properties(
    width=600,
    height=400,
    title='Student Performance Across Subjects'
)

chart
Out[68]:
In [69]:
import altair as alt
import pandas as pd

# Load the dataset
data = pd.read_csv("exams.csv")

# Melt the data to long format for easier plotting
data_long = data.melt(id_vars=['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course'],
                      value_vars=['math score', 'reading score', 'writing score'],
                      var_name='subject', value_name='score')

# Create a dropdown selection for subjects
dropdown = alt.binding_select(options=['math score', 'reading score', 'writing score'], name='Select Subject:')
selection = alt.selection_single(fields=['subject'], bind=dropdown, init={'subject': 'math score'})

# Create the heatmap
heatmap = alt.Chart(data_long).transform_filter(
    selection
).mark_rect().encode(
    x='race/ethnicity:N',
    y='parental level of education:N',
    color='mean(score):Q',
    tooltip=['mean(score):Q']
).properties(
    width=600,
    height=400,
    title="Mean Student Scores by Race/Ethnicity and Parental Education Level"
).add_selection(
    selection
)

heatmap
Out[69]:
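Each heatmap cell is the mean score for one combination of race/ethnicity and parental education, restricted to the subject chosen in the dropdown. The same grid can be computed as a pivot table. The sketch below assumes the wide column names from exams.csv; `mean_score_grid` and `sample` are hypothetical names. Combinations with no students come out as NaN, which the heatmap simply leaves blank.

```python
import pandas as pd

def mean_score_grid(df: pd.DataFrame, subject: str) -> pd.DataFrame:
    """Mean `subject` score with parental education as rows and race/ethnicity as columns."""
    return df.pivot_table(index="parental level of education",
                          columns="race/ethnicity",
                          values=subject,
                          aggfunc="mean")

# Stand-in sample: the five rows shown by data.head() above
sample = pd.DataFrame({
    "race/ethnicity": ["group A", "group D", "group E", "group B", "group E"],
    "parental level of education": ["high school", "some high school", "some college",
                                    "high school", "associate's degree"],
    "math score": [67, 40, 59, 77, 78],
})

grid = mean_score_grid(sample, "math score")
print(grid)
```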
In [70]:
import pandas as pd
import altair as alt

# Load the dataset
data = pd.read_csv("exams.csv")

# Melt the data to long format for easier plotting
data_long = data.melt(id_vars=['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course'],
                      value_vars=['math score', 'reading score', 'writing score'],
                      var_name='subject', value_name='score')

# Create a dropdown selection for the grouping variable
grouping_options = ['gender', 'race/ethnicity', 'parental level of education', 'lunch', 'test preparation course']
dropdown = alt.binding_select(options=grouping_options, name='Group by:')
selection = alt.selection_single(fields=['group'], bind=dropdown, init={'group': 'gender'}, name='selector')

# Create a base chart
base = alt.Chart(data_long).transform_calculate(
    group='datum[selector.group]'  # This will dynamically use the selected group
).mark_rect().encode(
    x='subject:N',
    y=alt.Y('group:O', title='Group', axis=alt.Axis(labelLimit=300)),  # Increase the label limit
    color='mean(score):Q',
    tooltip=['mean(score):Q']
).properties(
    width=600,
    height=400,
    title="Mean Student Scores by Grouping Variable",
    padding={"left": 150, "right": 50, "top": 10, "bottom": 30}  # Add padding to the left
).add_selection(
    selection
)

# Save to HTML so the chart can be opened in a browser
base.save('heatmap_interactive.html')

# Show the chart (only the last expression in a notebook cell is displayed)
base

Heatmap Interactive Visualization

Please refer to the interactive heatmap of the student performance data, which the cell above saves as heatmap_interactive.html. The chart lets you select different grouping variables to compare mean scores across subjects.

Open that file in a web browser to interact with the visualization.
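Whichever variable is picked in the "Group by:" dropdown, the heatmap shows the mean score for each (group, subject) pair of the long-format data. A minimal sketch of that aggregation, assuming the column names from exams.csv; `mean_scores_by` and `sample` are hypothetical names introduced for illustration.

```python
import pandas as pd

SUBJECTS = ["math score", "reading score", "writing score"]
GROUPING_OPTIONS = ["gender", "race/ethnicity", "parental level of education",
                    "lunch", "test preparation course"]

def mean_scores_by(df: pd.DataFrame, group_col: str) -> pd.Series:
    """Mean score per (group, subject) pair, mirroring the heatmap's mean(score)."""
    if group_col not in GROUPING_OPTIONS:
        raise ValueError(f"unsupported grouping variable: {group_col!r}")
    # Same long format as data_long: one row per (student, subject)
    long = df.melt(id_vars=[group_col], value_vars=SUBJECTS,
                   var_name="subject", value_name="score")
    return long.groupby([group_col, "subject"])["score"].mean()

# Stand-in sample: lunch type and scores from the five rows shown by data.head()
sample = pd.DataFrame({
    "lunch": ["standard", "free/reduced", "free/reduced", "standard", "standard"],
    "math score": [67, 40, 59, 77, 78],
    "reading score": [67, 59, 60, 78, 73],
    "writing score": [63, 55, 50, 68, 68],
})

print(mean_scores_by(sample, "lunch"))
```

Validating `group_col` against the same list the dropdown offers keeps the helper from silently grouping by a score column.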

Conclusion

Based on the analysis of the "Students Performance in Exams" dataset, several key insights have been identified regarding the relationships between student demographics, socio-economic factors, and academic performance. These insights can help educators and policymakers understand and address the factors influencing student outcomes.

Gender and Performance

Reading Scores: On average, female students score higher than male students in reading. The pattern is visible in each of the charts above, indicating stronger reading performance among female students in this dataset.

Math Scores: Conversely, male students tend to perform better in math compared to female students. This suggests a gender disparity in performance that may warrant further investigation to understand underlying causes and address potential educational gaps.

Race/Ethnicity and Performance

Group E: Among the various race/ethnicity groups, students belonging to Group E show the highest average scores in both math and reading. This group's performance stands out, highlighting potential best practices or factors that could be emulated to improve performance in other groups.

Parental Education and Performance

Higher Education Levels: Students whose parents have attained higher education levels, particularly those with Bachelor's or Master's degrees, tend to achieve better scores in all subjects (math, reading, and writing). This correlation underscores the importance of parental education in influencing student academic success and suggests that educational interventions may benefit from involving and educating parents.

Lunch Type and Performance

Standard Lunch: Students who receive standard lunch tend to have higher scores compared to those who receive free or reduced lunch. However, it is important to note that this comparison might not fully capture the impact of lunch type, as we lack data on the students' performance before and after receiving free or reduced lunch. Further longitudinal studies are needed to determine the causal effects of lunch programs on academic performance.

Test Preparation Course and Performance

Positive Impact of Test Prep: Completing a test preparation course is associated with better scores in math, reading, and writing. This finding highlights the value of targeted test preparation in enhancing student performance and suggests that expanding access to such resources could be beneficial.
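The opening question about what share of test-prep students score above a threshold can be answered directly. A minimal sketch, assuming the column names from exams.csv; the threshold of 70 is an arbitrary illustrative value (the analysis does not fix one), and `share_above` and `sample` are hypothetical names. As with lunch type, such a comparison shows association, not causation.

```python
import pandas as pd

def share_above(df: pd.DataFrame, subject: str, threshold: float) -> pd.Series:
    """Percentage of students scoring strictly above `threshold`, by test-prep status."""
    above = df[subject] > threshold
    return above.groupby(df["test preparation course"]).mean() * 100

# Stand-in sample: the five rows shown by data.head() above
sample = pd.DataFrame({
    "test preparation course": ["completed", "none", "none", "none", "completed"],
    "math score": [67, 40, 59, 77, 78],
})

# Hypothetical threshold chosen for illustration only
print(share_above(sample, "math score", threshold=70))
```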

Recommendations

  1. Targeted Interventions for Gender Disparities: Develop and implement programs aimed at supporting male students in reading and female students in math to address gender disparities.

  2. Best Practices from Group E: Investigate and replicate the strategies or conditions that contribute to the success of Group E in other racial/ethnic groups.

  3. Parental Involvement: Encourage and facilitate parental involvement in education, especially for parents with lower educational attainment, to boost student performance.

  4. Comprehensive Lunch Program Evaluation: Conduct longitudinal studies to evaluate the impact of free and reduced lunch programs on academic performance, ensuring that comparisons account for changes over time.

  5. Expand Test Preparation Resources: Increase accessibility to test preparation courses, particularly for students from socio-economically disadvantaged backgrounds, to help level the playing field.

These conclusions and recommendations provide a foundation for informed decision-making and strategic planning aimed at improving educational outcomes for all students.

", "text/plain": "alt.Chart(...)"}, "metadata": {}, "output_type": "display_data"}]}}, "6cf8bda6312049dc884a68ce08302e40": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {}}, "75aae03d908b4e5594f8eb0a8aa660dd": {"model_module": "@jupyter-widgets/output", "model_module_version": "1.0.0", "model_name": "OutputModel", "state": {"layout": "IPY_MODEL_3f524e7fb43849f19e12e29e964b44d6", "outputs": [{"data": {"text/html": "\n
\n", "text/plain": "alt.Chart(...)"}, "metadata": {}, "output_type": "display_data"}]}}, "7d5e09be88a7421f998eca8cb802f0dc": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DropdownModel", "state": {"_options_labels": ["gender", "race/ethnicity", "parental level of education", "lunch", "test preparation course"], "description": "Group by:", "index": 1, "layout": "IPY_MODEL_e6966f9bad004bc795b163674a4413b3", "style": "IPY_MODEL_61f471d4718b4e36ab6fc324dc9428cb"}}, "7e1636bfe3284be9b504da3f84a231b4": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "VBoxModel", "state": {"_dom_classes": ["widget-interact"], "children": ["IPY_MODEL_7d5e09be88a7421f998eca8cb802f0dc", "IPY_MODEL_66fdf44803a34b76b31cfd7204bf3af7"], "layout": "IPY_MODEL_44327374eeec44f490740d7ef0ab73a1"}}, "9cd4f101c09f4d2d8a3ff7d9345b2a86": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {}}, "a8daebb6ba834990a383849eec37e6f1": {"model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DropdownModel", "state": {"_options_labels": ["gender", "race/ethnicity", "parental level of education", "lunch", "test preparation course"], "description": "Group by:", "index": 0, "layout": "IPY_MODEL_9cd4f101c09f4d2d8a3ff7d9345b2a86", "style": "IPY_MODEL_3d2b27f9f8cb436ab19ae5d8be6bf424"}}, "b9dd435d871549998af5ea1ba28ad252": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {}}, "e6966f9bad004bc795b163674a4413b3": {"model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": {}}}, "version_major": 2, "version_minor": 0}